[SPARK-31595][SQL] Spark sql should allow unescaped quote mark in quoted string#28393
[SPARK-31595][SQL] Spark sql should allow unescaped quote mark in quoted string#28393adrian-wang wants to merge 4 commits intoapache:masterfrom
Conversation
|
Test build #121991 has finished for PR 28393 at commit
|
| @@ -519,13 +520,13 @@ private[hive] class SparkSQLCLIDriver extends CliDriver with Logging { | |||
| for (index <- 0 until line.length) { | |||
| if (line.charAt(index) == '\'' && !insideComment) { | |||
| // take a look to see if it is escaped | |||
There was a problem hiding this comment.
@adrian-wang Should we update the comment to reflect the newly added condition?
| insideSingleQuote = !insideSingleQuote | ||
| } | ||
| } else if (line.charAt(index) == '\"' && !insideComment) { | ||
| // take a look to see if it is escaped |
xuanyuanking
left a comment
There was a problem hiding this comment.
LGTM for the changes, cc @cloud-fan
| @@ -519,13 +520,13 @@ private[hive] class SparkSQLCLIDriver extends CliDriver with Logging { | |||
| for (index <- 0 until line.length) { | |||
| if (line.charAt(index) == '\'' && !insideComment) { | |||
| // take a look to see if it is escaped | |||
|
@dilipbiswal @xuanyuanking Thanks for your advice, I have updated the code. |
|
Test build #122169 has finished for PR 28393 at commit
|
| ) | ||
| } | ||
|
|
||
| test("Should allow unescaped quote mark in quoted string") { |
There was a problem hiding this comment.
Plz add a prefix SPARK-31595: Should allow....
|
LGTM |
|
Test build #122206 has finished for PR 28393 at commit
|
|
|
||
| test("SPARK-31595 Should allow unescaped quote mark in quoted string") { | ||
| runCliWithin(1.minute)( | ||
| """SELECT '"legal string a';select 1 + 234;""".stripMargin -> "235" |
There was a problem hiding this comment.
Is this an only issue in the thrift server side? How about the spark side?
scala> sql("""SELECT '"legal string a';select 1 + 234;""").show()
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input 'select' expecting {<EOF>, ';'}(line 1, pos 25)
== SQL ==
SELECT '"legal string a';select 1 + 234;
-------------------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:268)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:135)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:49)
There was a problem hiding this comment.
This is not the same case. sql() only accepts single sql statement, even
sql("select 1; select 2;")will return error.
| } | ||
|
|
||
| test("SPARK-31595 Should allow unescaped quote mark in quoted string") { | ||
| runCliWithin(1.minute)( |
There was a problem hiding this comment.
Could you explicitly set false at spark.sql.parser.escapedStringLiterals for the tests below?
There was a problem hiding this comment.
Actually this has nothing to do with spark.sql.parser.escapedStringLiterals, even if the config is set to true, the parser should accept this string.
There was a problem hiding this comment.
Ur, I got it. I think the option is misleading... Could you remove it from the PR descritpion?
There was a problem hiding this comment.
Done that, thanks!
maropu
left a comment
There was a problem hiding this comment.
Looks fine and thanks for the update, @adrian-wang cc: @wangyum
| """SELECT '"legal string a';select 1 + 234;""".stripMargin -> "235" | ||
| ) | ||
| runCliWithin(1.minute)( | ||
| """SELECT "legal 'string b";select 22222 + 1;""".stripMargin -> "22223" |
There was a problem hiding this comment.
nit: let's not use the multiline string style for a single line string.
There was a problem hiding this comment.
Updated, thanks!
|
Test build #122329 has finished for PR 28393 at commit
|
|
cc @juliuszsompolski as well |
|
thanks, merging to master/3.0! |
…ted string ### What changes were proposed in this pull request? `def splitSemiColon` cannot handle unescaped quote mark like "'" or '"' correctly. When there are unmatched quotes in a string, `splitSemiColon` will not drop off semicolon as expected. ### Why are the changes needed? Some regex expression will use quote mark in string. We should process semicolon correctly. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Added Unit test and also manual test. Closes #28393 from adrian-wang/unescaped. Authored-by: Daoyuan Wang <me@daoyuan.wang> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 53a9bf8) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
def splitSemiColoncannot handle unescaped quote mark like "'" or '"' correctly. When there are unmatched quotes in a string,splitSemiColonwill not drop off semicolon as expected.Why are the changes needed?
Some regex expression will use quote mark in string. We should process semicolon correctly.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added Unit test and also manual test.